Occupancy models

Silas Principe

What are occupancy models?

Occupancy models are used to help estimate true occupancy of a species […]. They can assist with accounting for imperfect detection of organisms in a study, and help us determine the true occupancy and the detection probabilities of the species at the site.

Because detecting wildlife species is not done with 100% accuracy, we can use occupancy models to help us determine the proportion of times that we don’t detect the species and either the species is truly not there or the species is there, and we just did not detect it.

https://kevintshoemaker.github.io/NRES-746/Occupancy.html

A simple simulation

As more sites are visited, the error accumulates and the difference between the true occupancy and the observed one increases.

Model

We can model this data using the spOccupancy package.

library(spOccupancy)

occ_formula <- ~ salinity + temperature
det_formula <- ~ (1 | observer)

n.samples <- 15000
n.report <- 1000

# Create data object
N <- length(unique(data$site))
J <- length(unique(data$visit))
y_mat <- matrix(data$Y, nrow = N, ncol = J, byrow = TRUE)
salinity_vec <- as.numeric(tapply(data$salinity, data$site, function(x) x[1]))
temp_vec      <- as.numeric(tapply(data$temperature, data$site, function(x) x[1]))
occ_covs_mat  <- cbind(salinity = salinity_vec, temperature = temp_vec)
observer_int <- as.integer(factor(data$observer))
observer_mat <- matrix(observer_int, nrow = N, ncol = J, byrow = TRUE)

stopifnot(all(dim(y_mat) == c(N, J)))
stopifnot(all(dim(occ_covs_mat)[1] == N))
stopifnot(all(dim(observer_mat) == c(N, J)))

# Build the data list for PGOcc
data_obj <- list(
  y = y_mat,
  occ.covs = occ_covs_mat,
  det.covs = list(observer = observer_mat)
)
str(data_obj)
# y should be 500 rows (1 each site) x 3 columns (1 each visit)
# occ.covs should be 500 rows (1 each site) for 2 variables
# det.covs should be 500 rows (1 each site) x 3 columns (1 each visit)
inits <- list(alpha = 0, beta = 0, sigma.sq.psi = 0.5, 
              sigma.sq.p = 0.5, z = apply(data_obj$y, 1, max, na.rm = TRUE), 
              sigma.sq = 1, phi = 3 / 0.5)

out <- PGOcc(
  occ.formula = occ_formula,
  det.formula = det_formula,
  data = data_obj,
  n.samples = n.samples,
  n.burn = 4000,
  n.thin = 1,
  n.chains = 4,
  n.omp.threads = 1,
  verbose = TRUE,
  n.report = n.report,
  inits = inits
)

#reference values
# beta0 = -2, # Intercept for eco proc
# beta1 = 0.3, # Beta for salinity
# beta2 = 0.1, # beta for temperature
# alpha0 = 0.6, # baseline detection intercept
# sigma_obs = 0.5, # SD of random observer effects

summary(out)

Call:
PGOcc(occ.formula = occ_formula, det.formula = det_formula, data = data_obj, 
    inits = inits, n.samples = n.samples, n.omp.threads = 1, 
    verbose = TRUE, n.report = n.report, n.burn = 4000, n.thin = 1, 
    n.chains = 4)

Samples per Chain: 15000
Burn-in: 4000
Thinning Rate: 1
Number of Chains: 4
Total Posterior Samples: 44000
Run Time (min): 0.3924

Occurrence (logit scale): 
               Mean     SD    2.5%     50%   97.5%   Rhat   ESS
(Intercept) -1.7230 0.6690 -3.0511 -1.7231 -0.4271 1.0000 18224
salinity     0.3211 0.0514  0.2240  0.3195  0.4254 1.0001 10495
temperature  0.0798 0.0293  0.0229  0.0795  0.1377 1.0000 18345

Detection (logit scale): 
              Mean     SD   2.5%    50%  97.5%   Rhat  ESS
(Intercept) 0.8322 0.2562 0.3091 0.8301 1.3516 1.0138 1232

Detection Random Effect Variances (logit scale): 
           Mean     SD   2.5%    50%  97.5%   Rhat  ESS
observer 0.3613 0.5284 0.0608 0.2292 1.4605 1.0022 6997

We can then compare the fitted values with the true ones to see how good our model is on identifying the simulated values.

occ <- predict(out, as.matrix(cbind(intercept = 1, data_obj$occ.covs)), type = "occupancy", ignore.RE = T)
occ_quants <- apply(occ$psi.0.samples, 2, quantile, c(0.025, 0.5, 0.975))
occ_med <- occ_quants[2,]
true_occ <- plogis(-2 + 0.3 * data_obj$occ.covs[,1] + 0.1 * data_obj$occ.covs[,2])

plot(true_occ,
     occ_med, pch = 21, bg = "lightblue",
     xlab = "Simulated", ylab = "Estimated")
abline(lm(occ_med ~ true_occ), col = "grey")

det <- predict(out, as.matrix(cbind(intercept = 1, observer = data$observer)), type = "detection")
det_quants <- apply(det$p.0.samples, 2, quantile, c(0.025, 0.5, 0.975))
det_med <- unname(apply(det_quants, 2, \(x) x[2]))

true_p <- plogis(0.6 + data$observer_eff)
plot(true_p, det_med, pch = 21, bg = "orange",
     xlab = "True detection probability", ylab = "Estimated detection probability")
abline(lm(det_med ~ true_p), col = "grey")

Just a reminder

A true occupancy model depends on multiple visits to the same site to be able to estimate the detection part of the model. There are, however, some works using single visit occupancy models.